Abstract

This paper focuses on two desired properties of cell-based switches for digital data net- 
works: (1) data cells should not be detained inside the switch any longer than necessary 
(the work-conserving property) and (2) data cells that have been in the switch longer 
(older cells) should have priority over younger cells (the order-conserving property). A 
well-known, but expensive design of a work- and order-conserving switch is the output- 
queued switch. 

A different switch design is the speedup crossbar switch, in which input buffers are con- 
nected to output buffers through a crossbar that runs at a multiple (called the speedup) of 
the external cell rate. A matching algorithm determines which cells are forwarded 
through the crossbar at any given time. Previous work has proposed a matching algo- 
rithm called the lowest output occupancy first algorithm (LOOFA). It is known that a 
LOOFA switch with speedup at least 2 is work-conserving. 

We propose a refinement of LOOFA called the lowest output occupancy and timestamp 
first algorithm (LOOTFA). The main result of this paper is that a LOOTFA crossbar 
switch is work- and order-conserving provided that the speedup is at least 3. We prove 
this result and consider some generalizations. 
1. Introduction



A cell-based switch processes fixed-sized chunks of data called cells, which arrive at 
switch inputs, pass through the switch proper, and depart from switch outputs. Each cell 
contains an identification of the single output to which it is destined. For convenience, 
we assume that the switch has the same number, N, of inputs and outputs and we assume 
that each input and output has the same capacity in cells per second. This capacity is 
called the cell rate, and its reciprocal, the cell time. We assume that all activities of the 
switch are synchronized to slots, each of which lasts one cell time. Figure 1 illustrates a 
cell-based switch. 
Figure 1: An NxN cell-based switch.

Although any realistic implementation would make extensive use of pipelining, for 
convenience we model the activity in the switch during each slot as a sequence of phases: 
an inhale phase, during which at most one cell from each input is accepted into the 
switch; a number of transfer phases, during which cells move around inside the switch; 
and an exhale phase, during which the switch emits at most one cell onto each output. 
See Figure 2. "Accepting" a cell during the inhale phase can be considered as the book- 
keeping necessary to account for a cell that arrived during the previous slot, and "emit- 
ting" a cell during the exhale phase can be considered as the bookkeeping necessary to 
account for a cell that will depart during the following slot. These bookkeeping activities 
are covered by the pipeline delay and take no real time in an implementation. 

The switch must contain buffer memory to hold temporary excesses of cells that result 
from short-term fluctuations in the arrival rate of cells destined to a given output. For 
example, multiple cells destined for the same output could be inhaled into the switch 
during the same slot, and the switch would have to hold these cells while the output ex- 
haled them one by one. Mechanisms to prevent buffer overflow such as flow-control 
back-pressure or rate reservation are important but beyond the scope of this paper. We 
also ignore the rate- or phase-matching buffer at each input that is typically used to bring 
arriving cells into synchrony with the slot time of the switch. 

In this paper we focus on two desired behaviors of a cell-based switch: (1) cells should 
not needlessly sit in buffers and (2) cells that have been in the switch longer (older cells)
should have priority over younger cells. 
Figure 2: Model of the activities in a switch during a slot.

The latency of a cell is the number of slot boundaries between its inhale and its exhale. 
The first desired behavior can be stated formally as: the total latency over all cells is as 
small as possible. This is equivalent to the condition that each output always exhales 
some cell whenever there are any cells in the switch destined for that output. A switch 
that behaves in this manner is called work-conserving. 

Whenever the switch contains multiple cells destined to the same output, the total la- 
tency is unaffected by the order in which the cells are exhaled. Given the choice, it seems 
good to give older cells priority over younger cells. Stated formally, we desire that each 
time an output exhales a cell, there are no older cells in the switch destined for that out- 
put. A switch that behaves in this manner is called order-conserving. In Section 5.3 we 
revisit the notion of "order-conserving" in a more general context. 

A cell-based switch that is both work- and order-conserving should rightly be called 
ideal, but a more common term is the eponymous output-queued. To avoid confusion we
refer to the behavior as ideal and the well-known implementation, described in the next 
paragraph, as output-queued. 

The well-known implementation of an ideal cell-based switch is the output-queued 
switch, in which the switch takes cells directly into buffers local to each output, as shown 
in Figure 3. Assuming each non-empty output unit always exhales one of its oldest cells, 
this design is clearly work- and order-conserving, hence ideal. Unfortunately it also is 
expensive. Because all inputs could simultaneously inhale cells destined to the same out- 
put, the connection into each output unit must have a capacity of N times the cell rate: 
either N times wider (as in Figure 3), N times faster, or some combination. None of these 
alternatives scales well as N increases. 
Figure 3: An NxN output-queued switch.

Another cell-based switch design is the crossbar speedup switch, which is illustrated in 
Figure 4. This switch contains input units, output units, and a crossbar interconnect. 
Cells are buffered at the input units and at the output units. The actions during each slot 
consist of an inhale phase, S (the speedup) transfer phases, and an exhale phase. During 
the inhale phase, each input unit inhales at most one cell and buffers it. During each 
transfer phase, the crossbar moves cells from input units to output units, subject to the 
restrictions that no more than one cell can be removed from any input unit and no more 
than one cell can be delivered to any output unit. During the exhale phase, each output 
unit removes at most one cell from its buffer and exhales it. 
Figure 4: A crossbar speedup switch.

Since each connection between the crossbar and an input or output unit is required to 
transfer at most one cell per transfer phase, of which there are S per slot, each such con- 
nection requires a bandwidth of only S times the cell rate. 

Each transfer phase proceeds in two parts: first a matching algorithm selects which 
cells in the input units to transfer (the match), and then the selected cells are transferred. 
We say that the cells in the input units compete for inclusion in the match. No pair of
included cells can conflict, either by sharing the same input (which would be an input
conflict) or sharing the same output (which would be an output conflict). The matching
algorithm typically produces a maximal match, in which no additional cell can be included
because each non-included cell has a conflict with some included cell. Since exactly the
included cells are transferred, we also call them the transferred cells.
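
The definition of a maximal match can be made concrete with a small checker. This is a
minimal sketch, not part of the switch design: the `Cell` record and its `cid` field are
hypothetical names introduced only so that two cells with the same input and output
remain distinguishable.

```python
from typing import Iterable, NamedTuple


class Cell(NamedTuple):
    cid: int  # hypothetical unique identifier (assumption, for illustration only)
    inp: int  # input unit holding the cell
    out: int  # output to which the cell is destined


def conflicts(a: Cell, b: Cell) -> bool:
    """Two cells conflict if they share an input or share an output."""
    return a.inp == b.inp or a.out == b.out


def is_conflict_free(match: Iterable[Cell]) -> bool:
    """True if no pair of distinct cells in the match conflicts."""
    m = list(match)
    return all(not conflicts(a, b) for k, a in enumerate(m) for b in m[k + 1:])


def is_maximal(match: Iterable[Cell], candidates: Iterable[Cell]) -> bool:
    """True if the match is conflict-free and every non-included candidate
    conflicts with some included cell."""
    m = set(match)
    return is_conflict_free(m) and all(
        any(conflicts(c, t) for t in m) for c in candidates if c not in m
    )
```

With three cells at (input, output) positions (1,1), (1,2), and (2,1), the single-cell
match containing (1,1) is maximal, while the single-cell match containing (1,2) is not,
because (2,1) could still be added.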

In the types of crossbar speedup switch we investigate, some ordering of cells is used 
to determine which cells are more important and thus win the competition. Different 
matching algorithms use different orderings. 

Typically, each input unit buffers its cells in a separate queue for each output, as shown 
in Figure 5. Although illustrated as separate queues, a linked-list implementation is typi- 
cal, and the usual name for these structures is virtual output queues. This design requires 
that the oldest cell in each queue always be a most important cell in that queue. Hence 
the oldest cell can always be included in a match in preference to any younger cell in its 
queue, and in fact the younger cells need not even be considered. 



Figure 5: A crossbar speedup switch with (virtual) output queues.

If the matching algorithm can be designed so that for each output, some cell destined to 
that output (if any exist) is always present in the output unit at the beginning of the exhale 
phase, then the crossbar speedup switch will be work-conserving. Krishna et al. [1] have 
developed a matching algorithm called the lowest output occupancy first algorithm 
(LOOFA) that achieves this property provided that the speedup S is at least 2. The occu- 
pancy of an output is the number of cells currently buffered in the output unit. In 
LOOFA, a cell destined to an output with lower occupancy is more important than a cell 
destined to an output with higher occupancy. Intuitively, an output unit containing fewer 
cells will need another cell sooner than an output unit containing more cells and hence 
cells destined to the lower occupancy output should be more important. 

If the matching algorithm can be designed so that for each output, an oldest cell des- 
tined to that output (if any exist) is always present in the output unit at the beginning of 
the exhale phase, then the crossbar speedup switch will be order-conserving in addition to 
being work-conserving — that is, it will be ideal. Prabhakar and McKeown [2] have de- 
veloped a matching algorithm called the most urgent cell first algorithm (MUCFA) that 
achieves this property provided that the speedup S is at least 4. In their design, the switch
schedules an exhale slot to each cell as it is inhaled, using the next available (not-yet- 
scheduled) exhale slot for the cell's destined output. Lower-numbered inputs get priority 
when the switch simultaneously inhales multiple cells destined to the same output. A 
cell's urgency is the number of slot boundaries remaining until its scheduled exhale. In 
MUCFA, a cell with lower urgency is more important than a cell with higher urgency. 
Clearly such a switch is ideal if it exhales each cell when its urgency is zero. 

Both LOOFA and MUCFA use matching algorithms that guarantee that each non-in- 
cluded cell has a conflict with some included cell that is at least as important, according 
to their respective definitions of importance, as the non-included cell. As a consequence, 
their matches are maximal. 

Since LOOFA takes no account of cells' ages, there is clearly no guarantee that it is or- 
der-conserving. However, the slight modification of resolving ties in output occupancy 
by favoring older cells produces an ideal switch provided that the speedup S is at least 3. 
We call this refinement the lowest output occupancy and timestamp first algorithm 
(LOOTFA). The fact that a LOOTFA switch with $S \ge 3$ is ideal is our main result.

2. Formal model of a crossbar speedup switch 

In this section we present our notation and a formal model of a crossbar speedup switch. 
The formal model defines the state of the switch and the allowable changes in this state 
that can happen during each phase. In a LOOTFA switch, the matcher and the output
selectors further constrain the behavior. In any specific execution history, the sequence of
input data also constrains the behavior. 

The formal model has two parameters, N and S:

N    the number of inputs of the switch; also the number of outputs
S    the crossbar speedup factor

2.1. Slot structure 

Time is divided into slots. Each slot consists of an inhale phase, S transfer phases, and an 
exhale phase. We label phase boundaries with consecutive integers starting with 0. The 
phase beginning at boundary b is called phase b. See Figure 6. 

In Section 5.2 we consider a more general phase arrangement. 
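
The phase labeling can be sketched as a small function. This is an illustration under one
assumption that the text does not state explicitly: phase 0 is taken to be the inhale
phase of the first slot, so that every slot occupies S + 2 consecutive phase labels. The
name `phase_kind` is hypothetical.

```python
def phase_kind(b: int, S: int) -> str:
    """Classify phase b for speedup S, assuming phase 0 is the first inhale phase.

    Each slot has S + 2 phases: one inhale, S transfers, one exhale.
    """
    r = b % (S + 2)
    if r == 0:
        return "inhale"
    if r == S + 1:
        return "exhale"
    return "transfer"
```

For S = 3, phases 0 through 4 form the first slot (inhale, three transfers, exhale), and
phase 5 is the inhale phase of the second slot.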
Figure 6: Example slot structure and phase boundary labels (S = 3).

2.2. Basic notational conventions 

We use the following notational conventions: 

i an input, $1 \le i \le N$

o an output, $1 \le o \le N$

h (the beginning of) an inhale phase 

x (the beginning of) a transfer phase 

e (the beginning of) an exhale phase 

b (the beginning of) any phase 

c a cell 

i(c) cell c's input 

o(c) cell c's destined output 

h(c) cell c's inhalation phase: c is inhaled during phase h(c) 

2.3. State variables 

The model has the following state variables: 

$IB_b$    the set of cells in any input unit at time b
$OB_b$    the set of cells in any output unit at time b



2.4. Cell input or output subset notation 

Given an arbitrary set C of cells, we use the following subscript notation for identifying 
subsets consisting of those cells with a given input or output (regardless of whether the 
cells are present in the switch at any given time): 

$C_{i=i} = \{c \in C : i(c) = i\}$    those cells in C with input i

$C_{i \ne i} = \{c \in C : i(c) \ne i\}$    those cells in C not with input i

$C_{o=o} = \{c \in C : o(c) = o\}$    those cells in C destined to output o

$C_{o \ne o} = \{c \in C : o(c) \ne o\}$    those cells in C not destined to output o

Here are three examples of this notation:

$IB_{b,i=i} = \{c \in IB_b : i(c) = i\}$    cells in input unit i at time b

$IB_{b,i=i,o=o} = \{c \in IB_b : i(c) = i \land o(c) = o\}$    cells in $IB_{b,i=i}$ destined to output o

$OB_{b,o=o} = \{c \in OB_b : o(c) = o\}$    cells in output unit o at time b

2.5. Conflict notation 

Cells that share an input or an output are in conflict and cannot both be transferred in the 
same phase. We use the following notation for the relation of two cells in conflict: 

$c_1 \sim c_2 \equiv i(c_1) = i(c_2) \lor o(c_1) = o(c_2)$    input or output conflict



2.6. Cell ordering notation 

We distinguish different cell orderings using subscripts: 

$c_1 <_y c_2$    $c_1$ precedes (is more important than) $c_2$ according to ordering y

$c_1 <_z c_2$    $c_1$ precedes $c_2$ according to z

$c_1 =_z c_2$    $c_1$ ties $c_2$ according to z

$c_1 \le_z c_2$    $c_1$ precedes or ties $c_2$ according to z

In all of the orderings we use in this paper, two cells tie if and only if neither precedes the
other, and furthermore, as suggested by our notation, tying is an equivalence relation. We
use the notation $<_{y,z}$ to designate the ordering derived from $<_y$ with ties broken by $<_z$:

$c_1 <_{y,z} c_2 \equiv c_1 <_y c_2 \lor (c_1 =_y c_2 \land c_1 <_z c_2)$    precedes according to y then z
$c_1 =_{y,z} c_2 \equiv c_1 =_y c_2 \land c_1 =_z c_2$    ties according to y then z
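
An ordering "y then z" of this kind corresponds to lexicographic comparison of key pairs.
The sketch below assumes, for illustration only, that each ordering is induced by a
numeric key where a smaller key means more important; the function names and the
`key_y`, `key_z` parameters are hypothetical.

```python
from typing import Any, Callable

Key = Callable[[Any], int]


def precedes(c1: Any, c2: Any, key_y: Key, key_z: Key) -> bool:
    """c1 precedes c2 according to y, with ties in y broken by z."""
    # Python compares tuples lexicographically, which matches the definition:
    # y strictly smaller, or y equal and z strictly smaller.
    return (key_y(c1), key_z(c1)) < (key_y(c2), key_z(c2))


def ties(c1: Any, c2: Any, key_y: Key, key_z: Key) -> bool:
    """c1 ties c2 according to y then z: they tie in both orderings."""
    return (key_y(c1), key_z(c1)) == (key_y(c2), key_z(c2))
```

For instance, with cells represented as (occupancy, timestamp) pairs, (1, 2) precedes
(1, 3) because the occupancies tie and the first timestamp is earlier.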

Next we give the initial state of the switch and the allowable changes in the state during 
inhale, transfer, and exhale phases. 



2.7. The initial state 

Initially there are no cells in the switch. 

$|IB_0| = 0$    the input buffer initially is empty

$|OB_0| = 0$    the output buffer initially is empty
2.8. An inhale phase

For any inhale phase b, there exists a set of inhaled cells $H_b$ such that:

$OB_{b+1} = OB_b$    the output buffer does not change

$IB_{b+1} = IB_b \cup H_b$    inhaled cells arrive in the input buffer

$\forall i : |H_{b,i=i}| \le 1$    each input inhales at most one cell

$\forall c \in H_b : h(c) = b$    inhalation time is correct

2.9. A transfer phase 

For any transfer phase b, there exists a set of transferred cells $X_b$ such that:

$X_b \subseteq IB_b$    transfer a subset of the input buffer

$IB_{b+1} = IB_b - X_b$    transferred cells depart from the input buffer

$OB_{b+1} = OB_b \cup X_b$    transferred cells arrive in the output buffer

$\forall i : |X_{b,i=i}| \le 1$    at most one transferred cell for each input

$\forall o : |X_{b,o=o}| \le 1$    at most one transferred cell for each output

The set of transferred cells $X_b$ is the set of cells included in the matching for phase b. In
a LOOTFA switch, $X_b$ also satisfies an additional condition given in Section 3.5.
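
The transfer-phase conditions translate directly into a state update over sets. The
following is a rough sketch under stated assumptions: the `Cell` record, its `cid` field,
and the function name `apply_transfer` are all hypothetical, and the buffers are modeled
as plain Python sets rather than per-unit queues.

```python
from typing import NamedTuple, Set, Tuple


class Cell(NamedTuple):
    cid: int  # hypothetical unique identifier (assumption)
    inp: int  # the cell's input
    out: int  # the cell's destined output


def apply_transfer(
    ib: Set[Cell], ob: Set[Cell], x: Set[Cell]
) -> Tuple[Set[Cell], Set[Cell]]:
    """Return the input and output buffers after transferring the set x.

    Raises ValueError if x violates any transfer-phase condition.
    """
    if not x <= ib:
        raise ValueError("transferred cells must come from the input buffer")
    inputs = [c.inp for c in x]
    outputs = [c.out for c in x]
    if len(set(inputs)) != len(inputs):
        raise ValueError("more than one transferred cell for some input")
    if len(set(outputs)) != len(outputs):
        raise ValueError("more than one transferred cell for some output")
    # Transferred cells depart the input buffer and arrive in the output buffer.
    return ib - x, ob | x
```

The function only validates and applies a given match; choosing the match is the job of
the matching algorithm.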

2.10. An exhale phase 

For any exhale phase b, there exists a set of exhaled cells $E_b$ such that:

$IB_{b+1} = IB_b$    the input buffer does not change

$E_b \subseteq OB_b$    exhale a subset of the output buffer

$OB_{b+1} = OB_b - E_b$    exhaled cells depart from the output buffer

$\forall o : |E_{b,o=o}| \le 1$    each output exhales at most one cell

In a LOOTFA switch, $E_b$ also satisfies additional conditions given in Section 3.6.

3. The LOOTFA switch 

In this section we present the additional conditions that a crossbar speedup switch must 
satisfy in order to be a LOOTFA switch and we develop concepts specific to the 
LOOTFA switch. 
3.1. Output occupancy, $oo_b$

We define the output occupancy $oo_b(c)$ of a cell c at time b as the number of cells in c's
destined output unit at time b. Formally,

$oo_b(c) = |OB_{b,o=o(c)}|$.

3.2. Output occupancy ordering, $<_{oo(b)}$

Given any two cells $c_1$, $c_2$, we say that $c_1$ precedes $c_2$ according to the output occupancy
ordering at time b, written $c_1 <_{oo(b)} c_2$, iff at time b, the output occupancy of $c_1$ is less
than the output occupancy of $c_2$. Formally,

$c_1 <_{oo(b)} c_2 \equiv oo_b(c_1) < oo_b(c_2)$.

3.3. Timestamp ordering, $<_t$

Given any two cells $c_1$, $c_2$, we say that $c_1$ precedes $c_2$ according to the timestamp
ordering, written $c_1 <_t c_2$, if and only if $c_1$ is inhaled before $c_2$. Formally,

$c_1 <_t c_2 \equiv h(c_1) < h(c_2)$.

The timestamp ordering indicates which cells are older than others. In Section 5.3 we 
consider alternative definitions of the timestamp ordering. 

3.4. Transfer time ordering, $<_x$

Given any two cells $c_1$, $c_2$, we say that $c_1$ precedes $c_2$ according to the basic transfer
time ordering, written $c_1 <_{bx} c_2$, if and only if $c_1$ is transferred before $c_2$. We consider
that a cell that is actually transferred is "transferred before" a cell that is never transferred.
Formally,

$c_1 <_{bx} c_2 \equiv \exists x_1 : c_1 \in X_{x_1} \land ((\exists x_2 : c_2 \in X_{x_2} \land x_1 < x_2) \lor \lnot(\exists x_2 : c_2 \in X_{x_2}))$.

We resolve ties in $<_{bx}$ arbitrarily to produce the total ordering $<_x$, called the transfer time
ordering.

Note that the transfer time ordering is a property of an execution history of the switch,
and is not in general available from the switch state at any moment in time. The transfer
time is not used in the implementation of the switch, but only in our analysis of its
behavior. We use $<_x$ in the definition of the least important relevant cell in Section 4.5.

The oracular nature of $<_x$ enables us to pick the cell that an execution history in fact
treats as less important in the event of a tie in the matching condition.

3.5. The LOOTFA matching condition and w(b) 

Like LOOFA and MUCFA, in each transfer phase LOOTFA requires that each non-included
cell have a conflict with some included cell that is at least as important. Roughly
speaking, LOOTFA uses a definition of importance that favors cells with lower output
occupancies, breaking ties in favor of cells with earlier timestamps.

A subtlety arises at this point. Whereas a cell's timestamp never changes, a cell's out- 
put occupancy can change over time. In particular, after any transfer phase, the relative 
output occupancies of the cells surviving in the input buffer may be different from what 
they were at the beginning of the phase. Since rapidly constructing a match is crucial to 
the performance of the switch, an implementation would most likely pipeline this process 
as much as possible. Reevaluating the relative importance of surviving cells on every 
transfer phase seems like it would be bothersome. 

It turns out to be sufficient for the transfer phase to construct its matching based on 
output occupancies as they were at the end of the most recent inhale phase. This has the 
consequence that the relative importance of surviving cells does not change during the 
transfer phases in the same slot, which seems like a property that could be exploited in a 
pipelined implementation. 

We define the function w(b) of time b as the time at the end of the most recent inhale
phase before b. Formally,

$w(b) = \begin{cases} 0 & \text{if } b = 0 \\ b & \text{if phase } b - 1 \text{ is an inhale phase} \\ w(b - 1) & \text{otherwise} \end{cases}$

In Section 5.1 we consider alternative definitions of w. 
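
The recursive definition of w can be unrolled into a short loop. This sketch assumes, as
an illustration, that phase 0 is the inhale phase of the first slot and that each slot
spans S + 2 phases; the helper name `is_inhale` is hypothetical.

```python
def is_inhale(b: int, S: int) -> bool:
    """Assumption: phase 0 is an inhale phase and each slot has S + 2 phases."""
    return b % (S + 2) == 0


def w(b: int, S: int) -> int:
    """Time at the end of the most recent inhale phase before b (0 if b = 0)."""
    # Walk backward until the preceding phase is an inhale phase or we reach 0.
    while b > 0 and not is_inhale(b - 1, S):
        b -= 1
    return b
```

For S = 3, every time from 1 through 5 maps to 1 (the end of the first inhale phase), and
times 6 and 7 map to 6, so all transfer phases in a slot see the same snapshot time.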

(Note that since the inhale phase does not affect output occupancies, we could equivalently
use the "initial" output occupancies as of the beginning of the current slot. Krishna
et al. [1] discovered that all of the transfer phases in the same slot could use initial output
occupancies when they proved that an $S \ge 2$ LOOFA switch was work-conserving.)

Now we can define the LOOTFA matching condition. For every transfer phase b, a 
LOOTFA switch satisfies the following condition in addition to the transfer phase condi- 
tions in Section 2.9: 

$\forall c \in IB_b - X_b : \exists c' \in X_b : c' \sim c \land c' \le_{oo(w(b)),t} c$.

That is, for each cell c in the input buffer that is not included in the match, there exists
some conflicting, included cell c' that is at least as important as c, where a cell is more
important than another if it has a lower output occupancy at time w(b) or, in the event of
a tie, if it has an earlier timestamp. Since c' is transferred while c remains in the input
buffer, we necessarily have $c' <_{oo(w(b)),t,x} c$. We say that c' is transferred in preference to
c.
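
One way to obtain a match that satisfies the LOOTFA matching condition is a greedy pass
over the cells in order of importance. The sketch below is an assumption-laden
illustration, not the matcher described in this paper: the `Cell` record, the function
names, and the `occupancy` mapping (output to occupancy as of time w(b)) are
hypothetical. Because cells are considered from most to least important, any cell that is
skipped must conflict with an already-included cell that is at least as important.

```python
from typing import Dict, Iterable, List, NamedTuple


class Cell(NamedTuple):
    cid: int  # hypothetical unique identifier (assumption)
    inp: int  # the cell's input
    out: int  # the cell's destined output
    ts: int   # inhalation phase h(c), used as the timestamp


def lootfa_match(input_buffer: Iterable[Cell], occupancy: Dict[int, int]) -> List[Cell]:
    """Greedy match by (output occupancy at w(b), timestamp), lowest first."""
    ranked = sorted(input_buffer, key=lambda c: (occupancy.get(c.out, 0), c.ts))
    used_in, used_out, match = set(), set(), []
    for c in ranked:
        if c.inp not in used_in and c.out not in used_out:
            match.append(c)
            used_in.add(c.inp)
            used_out.add(c.out)
    return match


def satisfies_matching_condition(
    input_buffer: Iterable[Cell], match: Iterable[Cell], occupancy: Dict[int, int]
) -> bool:
    """Check that every non-included cell conflicts with an included cell
    that is at least as important."""
    key = lambda c: (occupancy.get(c.out, 0), c.ts)
    chosen = set(match)
    return all(
        any((m.inp == c.inp or m.out == c.out) and key(m) <= key(c) for m in chosen)
        for c in input_buffer
        if c not in chosen
    )
```

The checker is independent of the greedy construction, so it can also be applied to
matches produced by other means.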
3.6. The LOOTFA exhale conditions

For every exhale phase b, a LOOTFA switch satisfies the following conditions in addition 
to the exhale phase conditions in Section 2.10: 

$\forall o : |OB_{b,o=o}| > 0 \Rightarrow |E_{b,o=o}| > 0$    OB work-conserving

$\forall c \in E_b : \forall c' \in OB_{b,o=o(c)} : c \le_t c'$    OB order-conserving

That is, each non-empty output o always exhales a cell, and the cell it exhales precedes or 
ties according to the timestamp ordering all cells in the output buffer destined to o. 
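
A minimal sketch of an exhale step that meets both conditions follows. It assumes, purely
for illustration, that the output buffer is a dictionary from output number to a list of
(timestamp, cell id) pairs; this representation and the name `exhale` are hypothetical.

```python
from typing import Dict, List, Tuple

BufferedCell = Tuple[int, str]  # (timestamp, hypothetical cell id)


def exhale(output_buffer: Dict[int, List[BufferedCell]]) -> Dict[int, BufferedCell]:
    """Remove and return one oldest cell from each non-empty output unit."""
    exhaled = {}
    for out, cells in output_buffer.items():
        if cells:  # OB work-conserving: a non-empty output always exhales
            oldest = min(cells, key=lambda c: c[0])  # OB order-conserving
            cells.remove(oldest)
            exhaled[out] = oldest
    return exhaled
```

Ties in timestamp are resolved here by list position, which is one arbitrary choice among
several that the conditions permit.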

4. The LOOTFA theorem 

We now come to our main result. 

Theorem (LOOTFA): A LOOTFA switch with speedup $S \ge 3$ is ideal.

The rest of Section 4 is devoted to a proof of this theorem. We assume an execution
history that is a counterexample, define a number of attributes (e, fc, $R_b$, h, $lirc_b$, $OBT_b$, $p_b$, H,
X, and E) of this execution history, and finally arrive at a contradiction.

4.1. Earliest failing exhale phase, e 

Recall from Section 1 that a switch is ideal if and only if it is both work-conserving and 
order-conserving. To be work-conserving, the switch must ensure that whenever there 
are any cells in the switch destined to output o at the beginning of an exhale phase b, out- 
put o exhales some cell during phase b. To be order-conserving, the switch must ensure 
that whenever an output o exhales some cell c, there are no cells in the switch destined to 
output o that precede c according to the timestamp ordering. 

Formally, a switch is ideal if, in every execution history, the following conditions both 
hold for every exhale phase b: 

$\forall o : |(IB_b \cup OB_b)_{o=o}| > 0 \Rightarrow |E_{b,o=o}| > 0$    work-conserving
$\forall c \in E_b : \forall c' \in (IB_b \cup OB_b)_{o=o(c)} : c \le_t c'$    order-conserving

We say that an exhale phase fails if it violates one or both of the above conditions. (For 
example, if at the beginning of an exhale phase b, a crossbar speedup switch has a cell 
destined to o in its input buffer but no cells destined to o in its output buffer, then exhale 
phase b is sure to fail.) 
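
The two conditions can be checked mechanically for a single exhale phase. In this sketch,
each cell is represented as an (output, timestamp) pair and `exhaled_ts` maps each output
that exhaled to the timestamp of the cell it exhaled; these representations and the
function name are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

CellRef = Tuple[int, int]  # (destined output, timestamp)


def exhale_phase_fails(
    ib: Iterable[CellRef], ob: Iterable[CellRef], exhaled_ts: Dict[int, int]
) -> bool:
    """True if the exhale phase violates work- or order-conservation.

    ib and ob hold the cells present at the beginning of the exhale phase.
    """
    present = list(ib) + list(ob)
    for out in {o for o, _ in present}:
        if out not in exhaled_ts:
            return True  # a cell was waiting, yet the output exhaled nothing
        if any(ts < exhaled_ts[out] for o, ts in present if o == out):
            return True  # some cell in the switch precedes the exhaled cell
    return False
```

The parenthetical example above corresponds to the first call in a check such as
`exhale_phase_fails([(1, 4)], [], {})`, which reports a failure because output 1 has a
waiting cell in the input buffer but nothing to exhale.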

In our assumed counterexample execution history, there must be some exhale phase 
that fails. We define e to be the earliest such failing exhale phase. 

4.2. The failing cell, fc

In order for exhale phase e to fail, there must be some cell $c \in (IB_e \cup OB_e)$ in the switch
such that either (1) no cell is exhaled on output o(c) (which would violate work-conserving)
or (2) a cell is exhaled on output o(c) that c precedes according to the timestamp
ordering (which would violate order-conserving). We pick one such cell and call it
fc, the failing cell.

We claim that at time e, cell fc must be in the input buffer and it must precede all cells
in output o(fc) according to the timestamp ordering. This claim follows from the 
LOOTFA exhale conditions of Section 3.6. 

Formally, we first prove that $\forall c \in OB_{e,o=o(fc)} : fc <_t c$. Assuming the contrary, there
exists a cell $c \in OB_{e,o=o(fc)}$ such that $c \le_t fc$. Then from the OB work-conserving
condition, output o(fc) must exhale some cell c', and from the OB order-conserving
condition, we have $c' \le_t c$, whence $c' \le_t fc$. This contradicts the definition of fc, so our
statement is proved.

Since fc does not precede itself according to the timestamp ordering, fc cannot be in
$OB_{e,o=o(fc)}$, and therefore $fc \notin OB_e$. By definition $fc \in (IB_e \cup OB_e)$, so we have $fc \in IB_e$.

This completes the proof of our claim.
In summary, we have

$fc \in IB_e$, and

$\forall c \in OB_{e,o=o(fc)} : fc <_t c$.

The rest of the proof proceeds as follows. We define a set of relevant cells, which are 
those cells sharing the same input as fc that contribute to allowing fc to survive in the 
input buffer until the earliest failing phase e. We define the least important relevant cell 
at time b and prove a property of its output occupancy. We examine the output buffer 
trailing cells, which are those cells in the output unit o(fc) that are preceded by fc ac- 
cording to the timestamp ordering. Then we define a potential at time b as a linear com- 
bination of various salient quantities in the switch state at time b. We establish a lower 
bound on the potential at the inhalation of the first relevant cell, push this bound forward 
phase by phase, and thus obtain a lower bound at time e. Finally we directly compute the 
potential at time e and obtain a value that violates the lower bound, thus showing a con- 
tradiction. 

4.3. Relevant cells, R 

We define a cell c to be relevant if: 

(1) c = fc, or

(2) c shares the same input as fc and is transferred in preference to some relevant cell
during some transfer phase b < e.

Recall from Section 3.5 that a cell c is said to be transferred "in preference to" a cell c'
during transfer phase b if and only if c is transferred, c' survives in the input buffer, c and
c' conflict, and c is at least as important as c'; formally,

$c \in X_b \land c' \in IB_b - X_b \land c \sim c' \land c \le_{oo(w(b)),t} c'$.
Intuitively, the relevant cells are fc and cells that, directly or indirectly, delay the
transfer of fc by means of input conflicts.

We define R as the set of all relevant cells. For any time b we define $R_b$ as the set of
all relevant cells present in the input buffer at time b. Formally,

$R_b = R \cap IB_b$.

A transfer phase during which some relevant cell is transferred we call an R-transfer 
phase. A transfer phase during which no relevant cell is transferred we call a nonR- 
transfer phase. 

4.4. Earliest inhale of a relevant cell, h 

Each relevant cell $c \in R$ has an inhalation phase h(c). We define h to be the earliest
inhalation phase of any relevant cell. Formally,

$h = \min_{c \in R} h(c)$.

Since R is non-empty ($fc \in R$), h is well-defined.

We claim that for any time b in the range $h < b \le e$, we have $|R_b| > 0$. Clearly
$|R_{h+1}| > 0$, since the switch has just inhaled a relevant cell and has not yet had a chance to
transfer it. An R-transfer phase b < e transfers a relevant cell $c \in R_b$, but since c cannot
be fc (because fc is not transferred before e), c must be transferred in preference to
some other relevant cell $c' \in R_b$, and consequently we have $c' \in R_{b+1}$. No other phase can
remove a relevant cell from the input buffer, so the claim is proved.

4.5. Least important relevant cell, $lirc_b$

For any time b in the range $h < b \le e$, we define the least important relevant cell $lirc_b$ at
time b as the maximum element of $R_b$ according to $<_{oo(w(b)),t,x}$. That is,

$lirc_b \in R_b \land \forall c \in R_b : c \le_{oo(w(b)),t,x} lirc_b$.

Since $|R_b| > 0$ and $<_x$ is total, the least important relevant cell exists and is unique. Note
that the least important relevant cell is defined in terms of the output occupancy ordering
as it is at time w(b), which, not surprisingly, is the output occupancy ordering used in the
LOOTFA matching condition.

We now prove two useful lemmas about the least important relevant cell. Note that
these lemmas relate to the assumed counterexample execution history with respect to
which e, $R_b$, h, and $lirc_b$ are defined.

Lemma (lirc survival): For any phase b in the range h < b < e, we have $lirc_b \in R_{b+1}$.
Proof: By definition $lirc_b \in R_b$. If b is an inhale phase, an exhale phase, or a transfer
phase that does not transfer $lirc_b$, then $lirc_b$ survives in the input buffer at time b + 1, and
consequently $lirc_b \in R_{b+1}$. It remains to consider the case in which b is a transfer phase
and $lirc_b \in X_b$. In this case, we must have $lirc_b \ne fc$, since fc is not transferred before
e. From the definition of relevance, $lirc_b$ must be transferred in preference to some other
relevant cell $c \in R_b$, which means that $lirc_b$ is at least as important as c, that is
$lirc_b \le_{oo(w(b)),t} c$. Since $lirc_b$ is transferred before c, we have $lirc_b <_x c$. But this gives us
$lirc_b <_{oo(w(b)),t,x} c$, which contradicts the definition of $lirc_b$. This completes the proof.

Lemma (lirc output occupancy): For any phase b in the range h < b < e, we have
$oo_{b+1}(lirc_{b+1}) \ge oo_{b+1}(lirc_b)$.

Proof: Intuitively, either the choice of $lirc_{b+1}$ is based on output occupancies at time
b + 1 or else $lirc_{b+1} = lirc_b$. By definition, $lirc_{b+1}$ is the maximum element of $R_{b+1}$ under
$<_{oo(w(b+1)),t,x}$. Since we have $lirc_b \in R_{b+1}$ by the previous lemma, it follows that
$lirc_b \le_{oo(w(b+1)),t,x} lirc_{b+1}$ and hence $lirc_b \le_{oo(w(b+1))} lirc_{b+1}$. If w(b + 1) = b + 1 then we are
done. Otherwise, by the definition of w (see Section 3.5), w(b + 1) = w(b) and phase b
cannot inhale any cells. Since $lirc_{b+1}$ cannot have been inhaled during phase b, it must
have been in the input buffer at time b, and consequently $lirc_{b+1} \in R_b$. By definition, $lirc_b$
is the maximum element of $R_b$ under $<_{oo(w(b)),t,x}$, so it follows that $lirc_{b+1} \le_{oo(w(b)),t,x} lirc_b$.
But w(b + 1) = w(b), so we have $lirc_{b+1} \le_{oo(w(b+1)),t,x} lirc_b$. We now have $lirc_b$ and $lirc_{b+1}$
each at least as important as the other according to $<_{oo(w(b+1)),t,x}$. Since this ordering is
total, it follows that $lirc_{b+1} = lirc_b$ and we are done.

4.6. Output buffer trailing cells, $OBT_b$

For any time b in the range $h < b \le e$, we define the output buffer trailing cells $OBT_b$ at
time b as the set of those cells in output unit o(fc) that are preceded by fc according to
the timestamp ordering. Formally,

$OBT_b = \{c \in OB_{b,o=o(fc)} : fc <_t c\}$.

4.7. Potential, $p_b$

For any time b in the range $h < b \le e$, we define the potential $p_b$ at time b by the
following magic formula:

$p_b = oo_b(lirc_b) - |OBT_b| - 2 \cdot |R_b|$.

We establish a lower bound on the potential at time h + 1, analyze the changes in
potential with each phase, and show that the resulting lower bound on potential at time e
contradicts the actual potential at time e.
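
The arithmetic behind the potential can be illustrated with a tiny helper. This is only a
numerical sketch: the function name and its three integer arguments (the output
occupancy of the least important relevant cell, the number of output buffer trailing
cells, and the number of relevant cells in the input buffer) are hypothetical stand-ins
for the quantities defined above.

```python
def potential(oo_lirc: int, n_obt: int, n_relevant: int) -> int:
    """Potential = occupancy of lirc's output - trailing cells - 2 * relevant cells."""
    return oo_lirc - n_obt - 2 * n_relevant


# Smallest values allowed just after the earliest relevant inhale:
# occupancy 0, no trailing cells, exactly one relevant cell.
start = potential(0, 0, 1)

# One more relevant cell with the other quantities unchanged lowers the potential by 2.
after_inhale = potential(3, 1, 3) - potential(3, 1, 2)

# One fewer relevant cell and one more trailing cell raises the potential by 1.
after_r_transfer = potential(3, 2, 1) - potential(3, 1, 2)
```

These three cases mirror the extreme component changes considered in Sections 4.8
through 4.10.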

4.8. Lower bound on potential at time h+1 

To bound the potential at time h + 1 we bound the components in its definition.

$oo_{h+1}(lirc_{h+1}) \ge 0$    An output occupancy cannot be negative.

$|OBT_{h+1}| = 0$    Consider any cell c in output unit o(fc) at time h + 1, that is,
$c \in OB_{h+1,o=o(fc)}$. Cell c must be transferred during some earlier
transfer phase x(c) < h + 1, and since phase h is an inhale phase, we
have x(c) < h. Cell c must be inhaled before it is transferred,
hence h(c) < x(c) < h. Since h is the inhalation time of the earliest
relevant cell and fc is relevant, we have $h \le h(fc)$ and thus
h(c) < x(c) < h(fc). Hence from the definition of the timestamp
ordering we have $c <_t fc$. So c is not an output buffer trailing cell.

$|R_{h+1}| = 1$    At time h + 1 the switch has just inhaled the earliest relevant cell.

Combining the components, we have

$p_{h+1} = oo_{h+1}(lirc_{h+1}) - |OBT_{h+1}| - 2 \cdot |R_{h+1}|$

$\ge -2$.

Next we consider the effects of each phase as b advances from h + 1 to e. 

4.9. Effect of an inhale phase 

To bound the change in potential during an inhale phase $b$, we bound the changes of the
components.

$\mathit{oo}_{b+1}(\mathit{lirc}_{b+1}) \ge \mathit{oo}_b(\mathit{lirc}_b)$

    The output buffer is unchanged by an inhale phase, so we have
    $\mathit{oo}_{b+1}(\mathit{lirc}_b) = \mathit{oo}_b(\mathit{lirc}_b)$. Combining this with the lirc
    output occupancy lemma (Section 4.5) we get
    $\mathit{oo}_{b+1}(\mathit{lirc}_{b+1}) \ge \mathit{oo}_{b+1}(\mathit{lirc}_b) = \mathit{oo}_b(\mathit{lirc}_b)$.

$|\mathit{OBT}_{b+1}| = |\mathit{OBT}_b|$

    The output buffer is unchanged by an inhale phase.

$|R_{b+1}| \le |R_b| + 1$

    Input $i(\mathit{fc})$ can inhale at most one cell.

Combining the components, we have

\[
\begin{aligned}
p_{b+1} &= \mathit{oo}_{b+1}(\mathit{lirc}_{b+1}) - |\mathit{OBT}_{b+1}| - 2 \cdot |R_{b+1}| \\
        &\ge \mathit{oo}_b(\mathit{lirc}_b) - |\mathit{OBT}_b| - 2 \cdot (|R_b| + 1) \\
        &= \mathit{oo}_b(\mathit{lirc}_b) - |\mathit{OBT}_b| - 2 \cdot |R_b| - 2 \\
        &= p_b - 2.
\end{aligned}
\]

4.10. Effect of an R-transfer phase 

To bound the change in potential during an R-transfer phase $b$, we bound the changes of
the components.

$\mathit{oo}_{b+1}(\mathit{lirc}_{b+1}) \ge \mathit{oo}_b(\mathit{lirc}_b)$

    No cells can depart the output buffer, so
    $\mathit{oo}_{b+1}(\mathit{lirc}_b) \ge \mathit{oo}_b(\mathit{lirc}_b)$.
    Combining this with the lirc output occupancy lemma (Section 4.5)
    we get $\mathit{oo}_{b+1}(\mathit{lirc}_{b+1}) \ge \mathit{oo}_{b+1}(\mathit{lirc}_b) \ge \mathit{oo}_b(\mathit{lirc}_b)$.

$|\mathit{OBT}_{b+1}| \le |\mathit{OBT}_b| + 1$

    There might be a new output buffer trailing cell, but there can be at
    most one.

$|R_{b+1}| = |R_b| - 1$

    Exactly one relevant cell is transferred.

Combining the components, we have

\[
\begin{aligned}
p_{b+1} &= \mathit{oo}_{b+1}(\mathit{lirc}_{b+1}) - |\mathit{OBT}_{b+1}| - 2 \cdot |R_{b+1}| \\
        &\ge \mathit{oo}_b(\mathit{lirc}_b) - (|\mathit{OBT}_b| + 1) - 2 \cdot (|R_b| - 1) \\
        &= \mathit{oo}_b(\mathit{lirc}_b) - |\mathit{OBT}_b| - 2 \cdot |R_b| + 1 \\
        &= p_b + 1.
\end{aligned}
\]

4.11. Effect of a nonR-transfer phase 

To bound the change in potential during a nonR-transfer phase $b$, we bound the changes
of the components.

$\mathit{oo}_{b+1}(\mathit{lirc}_{b+1}) \ge \mathit{oo}_b(\mathit{lirc}_b) + 1$

    Since $\mathit{lirc}_b$ is relevant, $\mathit{lirc}_b$ is not transferred during phase $b$.
    Therefore from the LOOTFA matching condition (Section 3.5)
    there must be some cell transferred in preference to $\mathit{lirc}_b$. Since
    any cell transferred in preference to $\mathit{lirc}_b$ and sharing input
    $i(\mathit{lirc}_b) = i(\mathit{fc})$ would by definition be relevant, and since no
    relevant cell is transferred during a nonR-transfer phase, there must be
    some cell transferred in preference to $\mathit{lirc}_b$ that shares output
    $o(\mathit{lirc}_b)$. Therefore $\mathit{oo}_{b+1}(\mathit{lirc}_b) = \mathit{oo}_b(\mathit{lirc}_b) + 1$.
    Combining this with the lirc output occupancy lemma (Section 4.5) we get
    $\mathit{oo}_{b+1}(\mathit{lirc}_{b+1}) \ge \mathit{oo}_{b+1}(\mathit{lirc}_b) = \mathit{oo}_b(\mathit{lirc}_b) + 1$.

$|\mathit{OBT}_{b+1}| = |\mathit{OBT}_b|$

    Since $\mathit{fc}$ is relevant, $\mathit{fc}$ is not transferred during phase $b$. Therefore
    from the LOOTFA matching condition (Section 3.5) there must be
    some cell transferred in preference to $\mathit{fc}$. Since any cell transferred
    in preference to $\mathit{fc}$ and sharing input $i(\mathit{fc})$ would by definition be
    relevant, and since no relevant cell is transferred during a nonR-transfer
    phase, there must be some cell transferred in preference to
    $\mathit{fc}$ that shares output $o(\mathit{fc})$. Let $c$ be such a cell. Since $c$ is
    transferred in preference to $\mathit{fc}$, we have $c \le_{\mathit{oo}(w(b)),t} \mathit{fc}$. Since
    $o(c) = o(\mathit{fc})$, we have $c =_{\mathit{oo}(w(b))} \mathit{fc}$ and hence $c \le_t \mathit{fc}$.
    Therefore $c$ is not an output buffer trailing cell. Since at most one cell can be
    transferred to any given output during a single transfer phase, $c$ is
    the only cell transferred to output $o(\mathit{fc})$ during phase $b$. So no
    output buffer trailing cells are transferred during phase $b$.

$|R_{b+1}| = |R_b|$

    No relevant cell is transferred.

Combining the components, we have

\[
\begin{aligned}
p_{b+1} &= \mathit{oo}_{b+1}(\mathit{lirc}_{b+1}) - |\mathit{OBT}_{b+1}| - 2 \cdot |R_{b+1}| \\
        &\ge (\mathit{oo}_b(\mathit{lirc}_b) + 1) - |\mathit{OBT}_b| - 2 \cdot |R_b| \\
        &= \mathit{oo}_b(\mathit{lirc}_b) - |\mathit{OBT}_b| - 2 \cdot |R_b| + 1 \\
        &= p_b + 1.
\end{aligned}
\]

4.12. Effect of an exhale phase 

To bound the change in potential during an exhale phase $b$, we bound the changes of the
components.

$\mathit{oo}_{b+1}(\mathit{lirc}_{b+1}) \ge \mathit{oo}_b(\mathit{lirc}_b) - 1$

    Since output $o(\mathit{lirc}_b)$ can exhale at most one cell, we have
    $\mathit{oo}_{b+1}(\mathit{lirc}_b) \ge \mathit{oo}_b(\mathit{lirc}_b) - 1$. Combining this with the lirc
    output occupancy lemma (Section 4.5) we get
    $\mathit{oo}_{b+1}(\mathit{lirc}_{b+1}) \ge \mathit{oo}_{b+1}(\mathit{lirc}_b) \ge \mathit{oo}_b(\mathit{lirc}_b) - 1$.

$|\mathit{OBT}_{b+1}| = |\mathit{OBT}_b|$

    Since $b < e$ and exhale phase $e$ is assumed to be the earliest phase
    in which the switch fails, output $o(\mathit{fc})$ cannot exhale any member
    of $\mathit{OBT}_b$.

$|R_{b+1}| = |R_b|$

    The input buffer is unchanged.

Combining the components, we have

\[
\begin{aligned}
p_{b+1} &= \mathit{oo}_{b+1}(\mathit{lirc}_{b+1}) - |\mathit{OBT}_{b+1}| - 2 \cdot |R_{b+1}| \\
        &\ge (\mathit{oo}_b(\mathit{lirc}_b) - 1) - |\mathit{OBT}_b| - 2 \cdot |R_b| \\
        &= \mathit{oo}_b(\mathit{lirc}_b) - |\mathit{OBT}_b| - 2 \cdot |R_b| - 1 \\
        &= p_b - 1.
\end{aligned}
\]

4.13. Lower bound on potential at time e 

To summarize the preceding sections, the effect on $p_b$ of the phases between time $h+1$
and time $e$ is as follows: each inhale phase decreases $p_b$ by at most 2, each transfer phase
(regardless of whether R-transfer or nonR-transfer) increases $p_b$ by at least 1, and each
exhale phase decreases $p_b$ by at most 1. Let

    H = the number of inhale phases between time $h+1$ and time $e$,

    X = the number of transfer phases between time $h+1$ and time $e$, and

    E = the number of exhale phases between time $h+1$ and time $e$.

(Note that $E$ does not include the failing phase, which starts at time $e$.) Then formally,

\[ p_e \ge p_{h+1} - 2 \cdot H + X - E. \]

Observe from the slot structure (Section 2.1) that in any interval starting at the end of an
inhale phase and ending at the start of an exhale phase, there are more transfer phases
than twice the number of inhale phases plus the number of exhale phases. In particular,
we have

\[ X > 2 \cdot H + E. \]
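
The count behind this inequality can be written out explicitly. As a sketch, suppose the
interval touches $m+1$ consecutive slots of the Section 2.1 structure (the symbol $m \ge 0$ is
introduced here only for illustration), starting just after the inhale phase of the first
slot and ending just before the exhale phase of the last. Then

\[
H = m, \qquad E = m, \qquad X = S \cdot (m+1),
\]

\[
X - (2 \cdot H + E) = S \cdot (m+1) - 3m = (S-3) \cdot m + S \;\ge\; 3 \quad \text{whenever } S \ge 3 .
\]

With $S = 2$ the difference is $2 - m$, which is no longer positive once $m \ge 2$, so this
counting step is where a speedup of at least 3 is used.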

Combining the above with the lower bound $p_{h+1} \ge -2$ (Section 4.8), we have

\[
\begin{aligned}
p_e &\ge p_{h+1} - 2 \cdot H + X - E \\
    &> p_{h+1} \\
    &\ge -2.
\end{aligned}
\]

4.14. The potential at time e 

Now let us compute $p_e$ directly.

$\mathit{oo}_e(\mathit{lirc}_e) = \mathit{oo}_e(\mathit{fc})$

    From the definition of $R$, each relevant cell except $\mathit{fc}$ is transferred
    during some transfer phase $b < e$. Hence we have $R_e = \{\mathit{fc}\}$ and
    thus $\mathit{lirc}_e = \mathit{fc}$.

$|\mathit{OBT}_e| = \mathit{oo}_e(\mathit{fc})$

    Since by assumption exhale phase $e$ fails, all cells in output unit
    $o(\mathit{fc})$ at time $e$ must be preceded by $\mathit{fc}$ according to
    $<_{\mathit{oo}(w(e)),t}$, that is, they are all output buffer trailing cells.

$|R_e| = 1$

    $R_e = \{\mathit{fc}\}$.

Combining the components, we have

\[
\begin{aligned}
p_e &= \mathit{oo}_e(\mathit{lirc}_e) - |\mathit{OBT}_e| - 2 \cdot |R_e| \\
    &= \mathit{oo}_e(\mathit{fc}) - \mathit{oo}_e(\mathit{fc}) - 2 \cdot 1 \\
    &= -2,
\end{aligned}
\]

which contradicts the lower bound obtained in Section 4.13. Hence the assumption of a
counterexample arrives at a contradiction, and the LOOTFA theorem is proved.

5. Generalizations 

For clarity, we have presented LOOTFA with a number of concrete assumptions that are 
not strictly required by our proof. In this section we show some generalizations. 

5.1. Generalized time of evaluation, w(b) 

When transfer phase $b$ decides which cells to include in the match, it evaluates the
importance of cells based on output occupancies as of time $w(b)$. Our original definition of
$w$ (Section 3.5) causes each transfer phase to construct its match based on output
occupancies as they are at the end of the most recent inhale phase. However, the only place in
which we use properties of $w$ is the proof of the lirc output occupancy lemma in Section
4.5, which succeeds if $w$ satisfies the following conditions for each phase $b$:

    $w(b+1) = w(b) \lor w(b+1) = b+1$        same as previous, or become current

    $|\mathit{IB}_{b+1} - \mathit{IB}_b| > 0 \Rightarrow w(b+1) = b+1$        become current if any inhaled cells

The basic idea is that the switch maintains a record of output occupancies based on which 
it constructs the match for a transfer phase and, from time to time, the switch updates this 
record to the current output occupancies. A LOOTFA switch must update its output oc- 
cupancies at the end of every inhale phase that actually inhales a cell, and it can also up- 
date whenever convenient. 

For example, consider the definition $w(b) = b$, which satisfies the generalized
conditions. Under this definition, every transfer phase constructs its match based on current
output occupancies. As explained in Section 3.5, this would likely be bothersome to
implement. However, the resulting switch would be ideal provided $S \ge 3$.
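
The two conditions on $w$ are easy to check mechanically for a given schedule. The following
Python sketch is only an illustration: the function name `satisfies_w_conditions` and the
list encoding (`w[b]` for $w(b)$, `inhaled[b]` for "phase $b$ added a cell to the input
buffer") are assumptions made here, not part of the model.

```python
def satisfies_w_conditions(w, inhaled):
    """Check the two generalized conditions on w for phases 0..len(w)-2.

    w[b]       -- time of evaluation w(b) (hypothetical list encoding)
    inhaled[b] -- True if phase b inhaled any cell, i.e. |IB_{b+1} - IB_b| > 0
    """
    for b in range(len(w) - 1):
        same_as_previous = w[b + 1] == w[b]
        becomes_current = w[b + 1] == b + 1
        # Condition 1: w(b+1) = w(b) or w(b+1) = b+1.
        if not (same_as_previous or becomes_current):
            return False
        # Condition 2: if phase b inhaled a cell, w must become current.
        if inhaled[b] and not becomes_current:
            return False
    return True


# One slot with S = 3: phase 0 inhales, phases 1..3 transfer, phase 4 exhales.
inhaled = [True, False, False, False, False]
w_original = [0, 1, 1, 1, 1, 1]   # frozen at the end of the inhale phase
w_current = [0, 1, 2, 3, 4, 5]    # w(b) = b, always current
w_stale = [0, 0, 0, 0, 0, 0]      # never refreshed after the inhale
```

Under this encoding both `w_original` and `w_current` pass the check, while `w_stale` fails
the second condition because it is not refreshed after a phase that inhaled a cell.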

5.2. Generalized phase arrangement

Our original phase arrangement was a sequence of slots each consisting of an inhale 
phase, S transfer phases, and an exhale phase (Section 2.1). However, the only place in 
which we use properties of the phase arrangement is the counting argument in Section 
4.13. This argument requires that 

\[ X > 2 \cdot H + E \]

for any interval starting at the end of an inhale phase and ending at the start of an exhale 
phase, where 

H = the number of inhale phases in the interval, 

X = the number of transfer phases in the interval, and 

E = the number of exhale phases in the interval. 

In addition to our original slot structure, there are many other ways of arranging phases 
that satisfy this condition. For example, we can arrange the phases into multislots, in 
which each phase of a slot repeats n times, as illustrated in Figure 7. 

Figure 7: Example multislot structure (n=2, S=3).

Since all of the transfer phases in the same multislot are allowed to use the same output 
occupancies, a multislot implementation could probably be pipelined more extensively 
than a single slot design. This would achieve higher throughput at the cost of higher la- 
tency in real time due to increased discrepancy between real time and model time. 
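
The counting condition can be checked by brute force for a candidate phase arrangement.
The Python sketch below assumes one reading of the multislot structure (n inhale phases,
then n·S transfer phases, then n exhale phases per multislot); the function names and the
"I"/"X"/"E" encoding are hypothetical and chosen only for this illustration.

```python
def multislot_phases(num_multislots, n, speedup):
    """Phase sequence in which each phase of a slot repeats n times (assumed layout)."""
    one_multislot = ["I"] * n + ["X"] * (n * speedup) + ["E"] * n
    return one_multislot * num_multislots


def counting_condition_holds(seq):
    """True if X > 2*H + E for every interval that starts at the end of an
    inhale phase and ends at the start of a later exhale phase."""
    for start, first in enumerate(seq):
        if first != "I":
            continue
        for end in range(start + 1, len(seq)):
            if seq[end] != "E":
                continue
            inside = seq[start + 1:end]  # phases strictly inside the interval
            h = inside.count("I")
            x = inside.count("X")
            e = inside.count("E")
            if not x > 2 * h + e:
                return False
    return True


figure7_like = multislot_phases(num_multislots=4, n=2, speedup=3)
too_slow = multislot_phases(num_multislots=4, n=2, speedup=2)
```

With these inputs the check succeeds for the n=2, S=3 arrangement and fails for n=2, S=2,
which matches the role the inequality plays in Section 4.13. A finite check like this is
only a sanity test, not a proof for all interval lengths.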



5.3. Generalized timestamp ordering, $\le_t$

Our original timestamp ordering given in Section 3.3 said that a cell inhaled before
another must precede the other according to the timestamp ordering. However, the only
place in which we use properties of the timestamp ordering is the lower bound calculation
in Section 4.8, which succeeds if $\le_t$ satisfies the following timestamp condition for any
cells $c_1$, $c_2$ and any transfer phase $x$:

\[ h(c_1) < x < h(c_2) \Rightarrow c_1 \le_t c_2. \]

That is, if the inhalation phases of two cells are separated by an intervening transfer 
phase, the earlier cell must precede or tie the later cell according to the timestamp order- 
ing. 

The timestamp controls how a LOOTFA switch orders cells destined to the same output.
Provided that the speedup $S \ge 3$, during each exhale phase each output exhales a
cell that precedes or ties, according to the timestamp order, all cells in the switch destined
to that output (if any). That is, for every exhale phase $b$, we have

    $\forall o : |\mathit{IB}_{b,o=o} \cup \mathit{OB}_{b,o=o}| > 0 \Rightarrow |E_{b,o=o}| > 0$        work-conserving

    $\forall c \in E_b : \forall c' \in \mathit{IB}_{b,o=o(c)} \cup \mathit{OB}_{b,o=o(c)} : c \le_t c'$        "order-conserving"

We could say that the timestamp ordering defines the meaning of "older" and "younger" 
and therefore the meaning of "order-conserving". Perhaps it would be better to call the 
condition timestamp-conserving. 

For example, our original timestamp ordering orders cells according to their inhalation
phases. An output of an $S \ge 3$ LOOTFA switch using this timestamp exhales cells in
order of inhalation phase, but an arbitrary order applies to cells inhaled during the same
phase.

For a second example, consider the timestamp ordering which orders cells according to
their inhalation phases and breaks ties by ordering cells according to their input number.
Since an input can inhale at most one cell per phase, this is a total order. An output of an
$S \ge 3$ LOOTFA switch using this timestamp exhales cells strictly according to this total
order.
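
As a small illustration of the second example, the total order can be represented by a
lexicographic key. In this Python sketch the `Cell` type, its field names, and the `stamp`
function are hypothetical, introduced only to show the tie-breaking rule.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    inhale_phase: int  # h(c), the phase in which the cell was inhaled
    inp: int           # i(c), the input number, used only to break ties


def stamp(cell):
    """Timestamp key: inhalation phase first, input number breaks ties."""
    return (cell.inhale_phase, cell.inp)


cells = [Cell(8, 1), Cell(0, 2), Cell(0, 0)]
ordered = sorted(cells, key=stamp)
```

Because two distinct cells never share both an inhalation phase and an input, no two cells
get the same key, and any cell inhaled in an earlier phase sorts first, so the timestamp
condition above holds for this key.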

For a third example, consider a multislot switch (Section 5.2) in which the timestamp
ordering orders cells according to their inhalation multislot, but arbitrarily orders cells
that are inhaled during the repeated inhale phases of the same multislot. An output of an
$S \ge 3$ LOOTFA switch using this timestamp exhales cells according to the timestamp
ordering, which may or may not be useful.

For a fourth example, consider the trivial timestamp ordering, in which all cells tie.
The effect of such a definition is to remove from the LOOTFA switch all consideration of
the age of a cell, thus reducing it to a LOOFA switch. In this case the order-conserving
success condition is trivially satisfied and the only interesting property is
work-conserving. We can slightly modify our proof of the LOOTFA theorem to obtain a proof of the
fact that an $S \ge 2$ LOOFA switch is work-conserving. Assuming a failing exhale phase
$e$, using the same definitions of $\mathit{fc}$, $R_b$, $h$, $\mathit{lirc}_b$, $H$, $X$, and $E$,
and using the modified potential $p_b$ at time $b$ defined by

\[ p_b = \mathit{oo}_b(\mathit{lirc}_b) - |R_b|, \]

we can show that

    $p_{b+1} \ge p_b - 1$ for each inhale phase $b$,
    $p_{b+1} \ge p_b + 1$ for each transfer phase $b$,
    $p_{b+1} \ge p_b - 1$ for each exhale phase $b$,
    $p_e \ge p_{h+1} - H + X - E > p_{h+1}$, and
    $p_e = -1$,

which is a contradiction (since $p_{h+1} \ge -1$ by the same reasoning as in Section 4.8),
hence proving that there can be no exhale phase that fails. (Note
that under a timestamp ordering in which all cells tie, there are never any output buffer
trailing cells.) Krishna et al. [1] proved that an $S \ge 2$ LOOFA switch is
work-conserving. Our result here provides an alternate proof.

6. Computing a LOOTFA match 

The obvious algorithm to compute a LOOTFA match is to repeatedly pick the most im- 
portant non-conflicted cell until no more cells can be picked. We call this the global 
minimum greedy algorithm. It turns out that if the timestamp is the same for all cells (as 
in the fourth example of Section 5.3), then the match can also be computed by visiting the 
input units in an arbitrary order picking the most important non-conflicted cell in each. 

6.1. The global minimum greedy algorithm 

Given two sets $C_1$ and $C_2$ of cells, we define the set $C_1 /{\sim}\, C_2$ as those cells in
$C_1$ that do not conflict with any cell in $C_2$. Formally,

\[ C_1 /{\sim}\, C_2 = \{ c_1 \in C_1 : \neg \exists c_2 \in C_2 : c_1 \sim c_2 \}. \]

The global minimum greedy algorithm computes a LOOTFA match at time $b$ with an
iterative sequence $X_{b,0}, X_{b,1}, \ldots, X_{b,Z}$ starting with $X_{b,0} = \{\}$ and producing
$X_{b,z+1}$ from $X_{b,z}$ by adding a most-important (that is, minimal) element of
$\mathit{IB}_b /{\sim}\, X_{b,z}$ according to $\le_{\mathit{oo}(w(b)),t}$. There may be ties, in which
case the choice between the tied minimal elements is arbitrary. When
$\mathit{IB}_b /{\sim}\, X_{b,z}$ is empty, we declare that $Z = z$ and the algorithm is done.
The result $X_b$ is $X_{b,Z}$. Each step is called a round. Observe that $Z \le N$.
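
The rounds described above can be sketched directly in Python. This is a minimal
illustration, not a switch implementation: the `Cell` type, the `occupancy` dictionary
(standing in for the output occupancies at time $w(b)$), and the integer `stamp` (standing
in for the timestamp ordering) are all assumptions made for the example.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    inp: int    # input unit i(c)
    out: int    # output unit o(c)
    stamp: int  # timestamp; a smaller value means the cell precedes


def conflicts(a, b):
    """Two cells conflict if they share an input or an output (a cell conflicts with itself)."""
    return a.inp == b.inp or a.out == b.out


def global_min_greedy(input_buffer, occupancy):
    """Repeatedly add a most important cell that conflicts with nothing chosen so far.

    occupancy maps each output to its occupancy as of time w(b).
    Ties are broken by list order, which is one arbitrary choice among many.
    """
    def importance(c):
        return (occupancy[c.out], c.stamp)  # lower occupancy first, then timestamp

    match = []
    candidates = list(input_buffer)         # the set IB_b /~ X_{b,z}
    while candidates:
        best = min(candidates, key=importance)
        match.append(best)
        candidates = [c for c in candidates if not conflicts(c, best)]
    return match


a, b, c, d = Cell(0, 0, 5), Cell(0, 1, 3), Cell(1, 0, 1), Cell(1, 1, 7)
match = global_min_greedy([a, b, c, d], occupancy={0: 2, 1: 0})
```

In this example the first round picks `b` (its output has the lowest occupancy), which
removes `a` and `d` from consideration, and the second round picks `c`. Each round removes
at least the chosen cell's input, so the loop runs at most N times.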

Clearly any result of the global minimum greedy algorithm satisfies the formal model
transfer phase requirements (Section 2.9),

\[ X_b \subseteq \mathit{IB}_b, \]

\[ \forall i : |X_{b,i=i}| \le 1, \text{ and} \]

\[ \forall o : |X_{b,o=o}| \le 1. \]

To show that the result satisfies the LOOTFA matching condition (Section 3.5),

\[ \forall c \in \mathit{IB}_b - X_b : \exists c' \in X_b : c' \sim c \land c' \le_{\mathit{oo}(w(b)),t} c, \]

we introduce the following invariant, which we claim holds at any round $z$:

\[ \forall c \in \mathit{IB}_b : (\exists c' \in X_{b,z} : c' \sim c \land c' \le_{\mathit{oo}(w(b)),t} c) \lor (c \in \mathit{IB}_b /{\sim}\, X_{b,z}). \]

For round $z = 0$, $X_{b,0}$ is empty, so $\mathit{IB}_b /{\sim}\, X_{b,0} = \mathit{IB}_b$ and therefore
$c \in \mathit{IB}_b /{\sim}\, X_{b,0}$ for all $c \in \mathit{IB}_b$. Assuming the invariant holds at some
round $z < Z$, we now show it holds at round $z+1$. Consider any cell $c \in \mathit{IB}_b$. We must
show that

\[ (\exists c' \in X_{b,z+1} : c' \sim c \land c' \le_{\mathit{oo}(w(b)),t} c) \lor (c \in \mathit{IB}_b /{\sim}\, X_{b,z+1}). \]

Case 1: Suppose $\exists c' \in X_{b,z} : c' \sim c \land c' \le_{\mathit{oo}(w(b)),t} c$. Then we are
done, because $X_{b,z+1} \supseteq X_{b,z}$.

Case 2: Otherwise, by the invariant for round $z$, we have $c \in \mathit{IB}_b /{\sim}\, X_{b,z}$. Let
$c'$ be the unique cell in $X_{b,z+1} - X_{b,z}$.

Case 2a: Suppose $c' \sim c$ (which includes the case $c' = c$). Because the global minimum
greedy algorithm chooses a most-important not-yet-conflicted cell on each round, we
have $c' \le_{\mathit{oo}(w(b)),t} c$. Thus we have $c' \in X_{b,z+1}$ and $c' \sim c$ and
$c' \le_{\mathit{oo}(w(b)),t} c$, and we are done.

Case 2b: On the other hand, if $c'$ does not conflict with $c$, we have
$c \in \mathit{IB}_b /{\sim}\, X_{b,z+1}$, and we are done. This completes the proof of the invariant.

By the termination condition, $\mathit{IB}_b /{\sim}\, X_{b,Z} = \{\}$, the invariant at the end of
the final round reduces to

\[ \forall c \in \mathit{IB}_b : \exists c' \in X_{b,Z} : c' \sim c \land c' \le_{\mathit{oo}(w(b)),t} c, \]

from which the LOOTFA matching condition follows immediately.

It also turns out that any legal LOOTFA match $X'_b$ can be produced by some run of the
global minimum greedy algorithm, by choosing the elements of $X'_b$ in non-decreasing
order according to $\le_{\mathit{oo}(w(b)),t}$. Hence the possible results of the global minimum
greedy algorithm are precisely the matches that are allowed by the definition of a LOOTFA
switch.

Note that for each pair of input $i$ and output $o$ an implementation can ignore any cells in
$\mathit{IB}_{b,i=i,o=o}$ except for any single minimal element according to $\le_t$. If the
timestamps for cells of input $i$ and output $o$ advance monotonically with inhalation time
(as in our initial LOOTFA timestamp ordering in Section 3.3), a virtual output queue will
always have a most important cell at the front.

6.2. The per-input minimum greedy algorithm 

We consider the case of the trivial timestamp ordering, in which all cells tie. This
reduces the LOOTFA switch to a LOOFA switch. Observe that if an included cell $c'$ has
an output conflict with $c$, that is, $o(c') = o(c)$, then we have
$\mathit{oo}_{w(b)}(c') = \mathit{oo}_{w(b)}(c)$ and hence $c' =_{\mathit{oo}(w(b)),t} c$. Because of this
property, a LOOFA match can be computed by the per-input minimum greedy algorithm, in which
each round $z$ consists of two parts: first choosing an input $i$ arbitrarily, subject to the
constraint that $\mathit{IB}_{b,i=i} /{\sim}\, X_{b,z}$ is non-empty; and then producing $X_{b,z+1}$
from $X_{b,z}$ by adding a minimal element of $\mathit{IB}_{b,i=i} /{\sim}\, X_{b,z}$ according to
$\le_{\mathit{oo}(w(b)),t}$.

Regardless of the order in which the inputs are considered, the per-input minimum
greedy algorithm produces a match that satisfies the LOOTFA matching condition. The
proof uses the same invariant as used in the previous section,

\[ \forall c \in \mathit{IB}_b : (\exists c' \in X_{b,z} : c' \sim c \land c' \le_{\mathit{oo}(w(b)),t} c) \lor (c \in \mathit{IB}_b /{\sim}\, X_{b,z}). \]

The only difference occurs in the proof within Case 2a that $c'$ is at least as important
as $c$, that is, $c' \le_{\mathit{oo}(w(b)),t} c$. If $c'$ and $c$ share the same input, then the
result follows from the fact that the per-input minimum greedy algorithm chooses a
most-important not-yet-conflicted cell from a chosen input. Otherwise, $c'$ and $c$ must share
the same output, from which it follows that $c' =_{\mathit{oo}(w(b)),t} c$, as explained above.

The per-input minimum greedy algorithm has a parallel implementation in which each 
input extends a bid to the destined output of its most important non-conflicted cell, and 
then each output that gets a bid accepts one and rejects the others. Additional bid-accept 
rounds handle rejected inputs. Although typically only a few rounds are necessary, in the 
worst case N rounds are required. Krishna et al. [1] use this implementation of the per- 
input minimum greedy algorithm. 
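
A sequential version of the per-input minimum greedy algorithm can be sketched as follows.
This Python example is an illustration under stated assumptions, not the bid-accept
hardware scheme: the `Cell` type, the `occupancy` dictionary (occupancies at time $w(b)$),
and the explicit `input_order` argument are hypothetical names chosen here.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    inp: int  # input unit i(c)
    out: int  # output unit o(c)


def per_input_min_greedy(input_buffer, occupancy, input_order):
    """Visit inputs in the given order; for each one, add its lowest-occupancy
    cell whose output is not yet matched.

    input_order must list each input at most once, so input conflicts cannot
    arise; only output conflicts need to be tracked.
    """
    match = []
    taken_outputs = set()
    for i in input_order:
        candidates = [c for c in input_buffer
                      if c.inp == i and c.out not in taken_outputs]
        if not candidates:
            continue  # every cell of this input is already conflicted
        best = min(candidates, key=lambda c: occupancy[c.out])
        match.append(best)
        taken_outputs.add(best.out)
    return match


A, B, C, D = Cell(0, 0), Cell(0, 1), Cell(1, 1), Cell(2, 0)
occupancy = {0: 2, 1: 0}
first = per_input_min_greedy([A, B, C, D], occupancy, [0, 1, 2])
second = per_input_min_greedy([A, B, C, D], occupancy, [1, 0, 2])
```

The two visiting orders give different matches (`[B, D]` and `[C, A]`), and in both every
unmatched cell conflicts with a matched cell of no greater output occupancy. A single pass
suffices here because an input with no unconflicted cell when visited cannot gain one later.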

Unfortunately, the per-input minimum greedy algorithm does not work for a non-trivial 
timestamp ordering, which in general is the case in a LOOTFA switch. 
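
A two-cell example shows how the per-input approach can break once timestamps matter. The
Python sketch below is illustrative only; `Cell`, `per_input_greedy`, and
`meets_matching_condition` are hypothetical names, and integer stamps stand in for a
non-trivial timestamp ordering.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    inp: int    # input unit i(c)
    out: int    # output unit o(c)
    stamp: int  # timestamp; a smaller value means the cell precedes


def importance(c, occupancy):
    return (occupancy[c.out], c.stamp)


def conflicts(a, b):
    return a.inp == b.inp or a.out == b.out


def per_input_greedy(cells, occupancy, input_order):
    """Per-input greedy pass, here using occupancy and then timestamp."""
    match, taken_outputs = [], set()
    for i in input_order:
        candidates = [c for c in cells if c.inp == i and c.out not in taken_outputs]
        if candidates:
            best = min(candidates, key=lambda c: importance(c, occupancy))
            match.append(best)
            taken_outputs.add(best.out)
    return match


def meets_matching_condition(cells, match, occupancy):
    """Every unmatched cell must conflict with a matched cell that is at least as important."""
    return all(
        any(conflicts(m, c) and importance(m, occupancy) <= importance(c, occupancy)
            for m in match)
        for c in cells if c not in match
    )


old = Cell(inp=0, out=0, stamp=1)
young = Cell(inp=1, out=0, stamp=5)
occupancy = {0: 0}
old_first = per_input_greedy([old, young], occupancy, [0, 1])
young_first = per_input_greedy([old, young], occupancy, [1, 0])
```

Visiting input 0 first selects the older cell and the condition holds. Visiting input 1
first selects the younger cell, which then blocks the older cell destined to the same
output, so the matching condition is violated.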